1. INTRODUCTION

In this note we obtain precise asymptotics, as $n \to \infty$, for the expected number
of distinct part sizes in a random composition of an integer $n$. Let us recall that
a multiset $\lambda = \{\lambda_1, \dots, \lambda_k\}$ is a partition of an integer $n$ if the $\lambda_j$ are positive
integers, called parts, such that $\sum_j \lambda_j = n$. The values of the $\lambda_j$'s are called part sizes.

Compositions are partitions in which the order of parts is significant. Thus, for
example, the integer 3 admits three partitions, {1,1,1}, {2,1} and {3}, and four
compositions, namely (1,1,1), (1,2), (2,1) and (3). According to our terminology
(1,2) is a composition of 3 into two parts with sizes 1 and 2. In analogy with random
partitions, by a random composition of an integer $n$ we mean a composition of $n$
that is chosen uniformly at random out of the set of all $2^{n-1}$ compositions of the
integer $n$. More formally, one considers the probability space consisting of the set
$C(n)$ of all compositions of $n$ equipped with the uniform probability measure. In
this setting, the number of distinct part sizes (or other characteristics) becomes a
random variable whose probabilistic behavior is to be studied.
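
The definitions above can be checked by brute force for small $n$. The following is a minimal sketch, not part of the original note; the function names `compositions` and `expected_distinct_part_sizes` are illustrative choices. It enumerates $C(n)$ and averages the number of distinct part sizes under the uniform measure.

```python
from fractions import Fraction

def compositions(n):
    """Yield every composition of n as a tuple of positive integers."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def expected_distinct_part_sizes(n):
    """Exact expectation of the number of distinct part sizes, by enumeration."""
    comps = list(compositions(n))
    # There are 2^(n-1) compositions of n, each with probability 2^(-n+1).
    assert len(comps) == 2 ** (n - 1)
    total = sum(len(set(c)) for c in comps)
    return Fraction(total, len(comps))

print(sorted(compositions(3)))          # the four compositions of 3
print(expected_distinct_part_sizes(3))  # (1 + 2 + 2 + 1) / 4
```

Enumeration costs $2^{n-1}$ steps, so this is only practical for $n$ up to about 20.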

Investigation of random partitions from this probabilistic perspective originated
with a paper by Erdős and Lehner [5] who studied the limiting distribution of the
total number of parts in a random partition. Subsequently, Wilf [11] found an
asymptotic formula for the expected number of distinct part sizes. Goh and Schmutz
[7] obtained more precise information on the distribution of the number of distinct
part sizes, namely they established the central limit theorem. Recently Corteel,
Pittel, Savage and Wilf [3] obtained a refined version of Wilf's result concerning
the expectation of the number of distinct part sizes in a random partition. Their
result allows one to obtain as many terms for the asymptotic expansion of this
expectation as one wishes. For example, on the "$o(1)$ level" this expectation is

$$\frac{\sqrt{6n}}{\pi} + \frac{3}{\pi^2} - \frac{1}{2} + o(1).$$

The aim of this note is to obtain asymptotics for the same quantity in the
case of random compositions. In order to state our result we need some notation.
For an integer $n$ consider the set $C(n)$ of all compositions $\kappa$ of $n$ equipped with the
uniform probability measure $P_n = P$ (that is, $P(\kappa) = 2^{-n+1}$ for every $\kappa \in C(n)$).
For a composition $\kappa = (\gamma_1, \dots, \gamma_k)$ the number of distinct part sizes, $D_n(\kappa)$, is
defined by the formula

$$D_n(\kappa) = 1 + \sum_{i=2}^{k} I_{\{\gamma_i \ne \gamma_j,\ j = 1, \dots, i-1\}},$$

where $I_A$ is the indicator function of the set $A$. We denote the integration with
respect to $P$ on $C(n)$ by $E$. We have:

Theorem. As $n \to \infty$,

$$ED_n = \log_2 n + \frac{\gamma}{\ln 2} - \frac{3}{2} + g(\log_2 n) + o(1),$$

where $\gamma$ is Euler's constant and $g$ is a mean-zero function of period 1 satisfying
$|g| \le 0.0000016$.

Thus, the expected number of distinct part sizes in a composition of an integer
$n$ asymptotically behaves like $\log_2 n$ plus a constant plus a small but periodic
oscillation. This oscillatory behavior, which just a few years ago was considered
surprising (to say the least), is by now a well documented and acknowledged feature
of sequences of geometric random variables, see e.g. [2], [4], [8], [10].
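
For a numerical feel of the Theorem, the exact expectation can be compared with the leading terms $\log_2 n + \gamma/\ln 2 - 3/2$. The sketch below is an illustration and not the method of this note. It assumes the identity $ED_n = \sum_{m} P(\text{some part equals } m)$ (linearity of expectation) and counts compositions avoiding a given part size $m$ with a simple recurrence. The names `exact_expected_distinct` and `leading_terms` are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant

def exact_expected_distinct(n):
    """Exact E[D_n] as the sum over m of P(some part of the composition equals m)."""
    total_comps = 2 ** (n - 1)
    expectation = 0.0
    for m in range(1, n + 1):
        # avoid[j] = number of compositions of j with no part equal to m
        avoid = [1]   # the empty composition of 0
        prefix = 1    # running value of avoid[0] + ... + avoid[j-1]
        for j in range(1, n + 1):
            a = prefix - (avoid[j - m] if j >= m else 0)
            avoid.append(a)
            prefix += a
        expectation += (total_comps - avoid[n]) / total_comps
    return expectation

def leading_terms(n):
    """log_2 n + gamma / ln 2 - 3/2, i.e. the Theorem without g and the o(1) term."""
    return math.log2(n) + EULER_GAMMA / math.log(2) - 1.5

for n in (10, 100, 400):
    print(n, exact_expected_distinct(n), leading_terms(n))
```

The two columns should approach each other as $n$ grows; the oscillation $g$ is far too small to be visible at this scale.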

We wish to observe that the asymptotics for the expected number of distinct
part sizes is the same as that for the expected length of the longest run of heads in $n$ tosses
of a fair coin, see e.g. [2] or [8]. Since the size of the largest part is one plus the
longest run of heads, it follows that on average one expects to see parts of all but
one of the sizes between 1 and the largest size (or runs of heads of all but one of the lengths
between 1 and the longest run).

2. OUTLINE OF A PROOF 

Quite often results like this are obtained through careful analysis of the
generating function. We will use a different approach. We will view a random composition
as (essentially) a randomly stopped sequence of i.i.d. geometric random variables
and we will express the number of distinct part sizes as a function of this sequence.
This will allow for direct and straightforward estimates. The same approach was
used successfully in [9] to handle a problem in which the generating function approach
was apparently futile. We believe that this technique will prove useful in many
other problems concerning random compositions. Our proof splits in a natural way
into two steps. In the first we will use the aforementioned representation and
probabilistic estimates to extract the main contribution to $ED_n$. Namely, we have

Proposition 1. As $n \to \infty$,

$$ED_n = \sum_{m=1}^{\infty} \Big\{1 - \Big(1 - \frac{1}{2^m}\Big)^{\frac{n}{2}(1 + o(1))}\Big\} + o(1).$$

The second, purely analytical step is to analyse the asymptotic behavior of the
infinite sum above. This goal could be accomplished by applying the so-called
Rice method (see e.g. [6] for a very good description and examples). Since this
method requires some tools from complex analysis we decided to take a different
route. As a result our analysis is completely elementary (thus making this paper
fully accessible to advanced undergraduates, for example). To facilitate our analysis
we define

$$f(x) = \sum_{m=1}^{\infty} \Big\{1 - \Big(1 - \frac{1}{2^m}\Big)^{2^x}\Big\}.$$

With this definition we will show that $f(x) - x$ tends to a limit as $x$ tends to infinity
along sequences of the form $\{x_0 + k\}_{k \in \mathbb{Z}}$, but does not possess an unrestricted limit
as $x \to \infty$. More specifically, we have

Proposition 2. For large positive $k$

$$f(x + k) = x + k + \gamma/\ln 2 - 1/2 + g(x) + O(2^{-x-k}),$$

where $\gamma$ is Euler's constant and

$$g(x) = -x - \gamma/\ln 2 + 1/2 - \sum_{m=-\infty}^{0} \exp(-2^{-m+x}) + \sum_{m=1}^{\infty} \big(1 - \exp(-2^{-m+x})\big)$$

is a nonconstant, zero-mean function of period 1 satisfying $|g(x)| \le .0000016$.
Clearly, the Theorem follows by combining Propositions 1 and 2.
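
The stated properties of $g$ (period 1, zero mean, amplitude below $.0000016$) can be probed numerically from the series in Proposition 2. This is a hedged sketch: the truncation limits (12 terms for $m \le 0$, 80 terms for $m \ge 1$) and the name `g` for the Python function are choices made for this illustration, adequate for $x$ in a small range around $[0, 2]$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant
LN2 = math.log(2)

def g(x, terms=80):
    """Evaluate g(x) from Proposition 2 by truncating both series."""
    # Sum over m = 0, -1, ..., -11 of exp(-2^(x-m)); terms decay doubly exponentially.
    left = sum(math.exp(-2.0 ** (x - m)) for m in range(0, -12, -1))
    # Sum over m = 1, ..., terms of 1 - exp(-2^(x-m)); expm1 avoids cancellation.
    right = sum(-math.expm1(-2.0 ** (x - m)) for m in range(1, terms + 1))
    return -x - EULER_GAMMA / LN2 + 0.5 - left + right

grid = [i / 200 for i in range(200)]
values = [g(x) for x in grid]
print("max |g| on grid:", max(abs(v) for v in values))
print("mean of g on grid:", sum(values) / len(values))
print("g(0.3) - g(1.3):", g(0.3) - g(1.3))
```

Working with the limiting series rather than with $f$ itself keeps the computation well conditioned, since the quantity of interest is of order $10^{-6}$.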

3. PROOF OF PROPOSITION 1 

Central to our approach is the following proposition.

Proposition 3. Let $\Gamma_1, \Gamma_2, \dots$ be i.i.d. geometric random variables with parameter
1/2 (that is, $P(\Gamma_1 = j) = 2^{-j}$, $j = 1, 2, \dots$) and define

$$\tau = \inf\{k \ge 1 : \Gamma_1 + \Gamma_2 + \dots + \Gamma_k \ge n\}.$$

Then the distribution of a randomly chosen composition in $C(n)$ is given by

$$\Big(\Gamma_1, \Gamma_2, \dots, \Gamma_{\tau-1},\ n - \sum_{j=1}^{\tau-1} \Gamma_j\Big).$$

This proposition is nothing more than a reiteration of a known (see e.g. [1])
connection between compositions of integers and {0, 1}-valued sequences. Namely,
a composition $\kappa = (\gamma_1, \dots, \gamma_k)$ of an integer $n$ into parts $\gamma_1, \dots, \gamma_k$ is associated
with a string of 0's and 1's of length $n$ as follows: there is a 1 in the $n$th place
and the numbers $\gamma_1, \dots, \gamma_k$ are "waiting times" for the first, second, ..., and $k$th
appearance of 1. (For example, the composition (1, 2, 3, 1, 1) of 8 corresponds to the
string 10100111 while (4, 2, 2) corresponds to 00010101.) Choosing a composition at
random amounts to having the 0's and 1's in the first $n - 1$ places occur according
to a binomial $\mathrm{Bin}(n - 1, 1/2)$ law. We refer to [9] for more details. Let $\tilde\Gamma_i(\kappa)$ denote
the parts of a randomly chosen composition $\kappa$, i.e.

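$$\tilde\Gamma_i(\kappa) = \Gamma_i(\kappa) \ \text{ for } i < \tau(\kappa) \quad\text{and}\quad \tilde\Gamma_{\tau(\kappa)}(\kappa) = n - \sum_{i=1}^{\tau(\kappa)-1} \Gamma_i(\kappa).$$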
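
The correspondence between compositions and 0-1 strings, and the representation of Proposition 3, can be sketched in a few lines. This is an illustration under the conventions stated above; the function names are hypothetical, and `random_composition` draws geometric waiting times and truncates the last one so that the parts sum to $n$.

```python
import random

def composition_to_bits(parts):
    """Each part of size s becomes s-1 zeros followed by a one (a waiting time for a 1)."""
    return "".join("0" * (s - 1) + "1" for s in parts)

def bits_to_composition(bits):
    """Inverse map; the string is assumed to end in '1'."""
    parts, run = [], 0
    for b in bits:
        run += 1
        if b == "1":
            parts.append(run)
            run = 0
    return tuple(parts)

def random_composition(n, rng=random):
    """Sample a composition of n from i.i.d. geometric(1/2) variables stopped at n."""
    parts, total = [], 0
    while total < n:
        s = 1
        while rng.random() < 0.5:  # geometric: P(s = j) = 2^(-j)
            s += 1
        s = min(s, n - total)      # the last part is n minus the sum of the earlier parts
        parts.append(s)
        total += s
    return tuple(parts)

print(composition_to_bits((1, 2, 3, 1, 1)))  # 10100111
print(bits_to_composition("00010101"))       # (4, 2, 2)
print(random_composition(8))
```

A composition with last part $\gamma_k$ is produced with probability $\prod_{i<k} 2^{-\gamma_i} \cdot P(\Gamma_k \ge \gamma_k) = 2^{-(n-1)}$, which is the uniform measure on $C(n)$.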
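Note that $\tilde\Gamma_\tau \le \Gamma_\tau$. The expected value of $D_n$ is computed as follows:

$$\begin{aligned}
ED_n &= 1 + E\sum_{i=2}^{\tau} I_{\{\tilde\Gamma_i \ne \tilde\Gamma_j,\ j = 1, \dots, i-1\}} \\
&= 1 + E\sum_{i=2}^{\tau-1} I_{\{\Gamma_i \ne \Gamma_j,\ j = 1, \dots, i-1\}} + P(\tilde\Gamma_\tau \ne \Gamma_1, \dots, \Gamma_{\tau-1}).
\end{aligned}$$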

We will first show that the last probability is negligible. This is because $\tau$, being a
$1 + \mathrm{Bin}(n - 1, 1/2)$ random variable, satisfies the bound

$$P(|\tau - E\tau| > t) \le 2\exp(-2t^2/(n - 1)),$$

so that

$$P\big(|\tau - E\tau| > \sqrt{(n-1)\log n}\big) \le 2\exp\Big\{-\frac{2(n-1)\log n}{n-1}\Big\} = O(1/n^2).$$

Therefore, letting $t_n = \sqrt{(n-1)\log n}$,

$$n_0 = E\tau - t_n = (n+1)/2 - \sqrt{(n-1)\log n},$$

and

$$n_1 = E\tau + t_n = (n+1)/2 + \sqrt{(n-1)\log n},$$

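we get

$$\begin{aligned}
P(\tilde\Gamma_\tau \ne \Gamma_1, \dots, \Gamma_{\tau-1})
&\le \sum_{j=1}^{n} P(\tilde\Gamma_\tau = j,\ \Gamma_1, \dots, \Gamma_{\tau-1} \ne j) \\
&\le \sum_{j=1}^{n} P(\tilde\Gamma_\tau \ge j,\ \Gamma_1, \dots, \Gamma_{\tau-1} \ne j) \\
&\le \sum_{j=1}^{n} P(\Gamma_\tau \ge j,\ \Gamma_1, \dots, \Gamma_{\tau-1} \ne j) \\
&\le \sum_{j=1}^{n} P(\Gamma_\tau \ge j,\ \Gamma_1, \dots, \Gamma_{\tau-1} \ne j,\ |\tau - E\tau| \le t_n) + nP(|\tau - E\tau| > t_n) \\
&= \sum_{k=n_0}^{n_1} \sum_{j=1}^{n} P(\Gamma_k \ge j,\ \Gamma_1, \dots, \Gamma_{k-1} \ne j,\ \tau = k) + O(1/n) \\
&\le \sum_{k=n_0}^{n_1} \sum_{j=1}^{\infty} P(\Gamma_k \ge j,\ \Gamma_1, \dots, \Gamma_{k-1} \ne j) + O(1/n) \\
&= \sum_{k=n_0}^{n_1} \sum_{j=1}^{\infty} \frac{1}{2^{j-1}} \Big(1 - \frac{1}{2^j}\Big)^{k-1} + O(1/n) \\
&= O\Big(\sum_{k=n_0}^{n_1} \frac{1}{k}\Big) + O(1/n) = o(1)
\end{aligned}$$

as $n \to \infty$. In the last step the inner sum over $j$ is $O(1/k)$, by comparison with an
integral, and $\sum_{k=n_0}^{n_1} 1/k = O(t_n/n)$. As for the other term, we have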

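$$\begin{aligned}
E\sum_{i=2}^{\tau-1} I_{\{\Gamma_i \ne \Gamma_j,\ j<i\}}
&= E\Big(\sum_{i=2}^{\tau-1} I_{\{\Gamma_i \ne \Gamma_j,\ j<i\}}\Big) I_{\{|\tau - E\tau| \le t_n\}}
 + E\Big(\sum_{i=2}^{\tau-1} I_{\{\Gamma_i \ne \Gamma_j,\ j<i\}}\Big) I_{\{|\tau - E\tau| > t_n\}} \\
&\le E\Big(\sum_{i=2}^{\tau-1} I_{\{\Gamma_i \ne \Gamma_j,\ j<i\}}\Big) I_{\{|\tau - E\tau| \le t_n\}} + (n - 2)P(|\tau - E\tau| > t_n).
\end{aligned}$$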

The second term is bounded above by $C/n$ and, of course, tends to 0 as $n \to \infty$.
For the first one we have:

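$$\begin{aligned}
E\Big(\sum_{i=2}^{\tau-1} I_{\{\Gamma_i \ne \Gamma_j,\ j<i\}}\Big) I_{\{|\tau - E\tau| \le t_n\}}
&\le E\Big(\sum_{i=2}^{n_1} I_{\{\Gamma_i \ne \Gamma_j,\ j<i\}}\Big) I_{\{|\tau - E\tau| \le t_n\}} \\
&\le E\sum_{i=2}^{n_1} I_{\{\Gamma_i \ne \Gamma_j,\ j<i\}} = \sum_{i=2}^{n_1} P(\Gamma_i \ne \Gamma_j,\ j < i).
\end{aligned}$$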

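Similarly,

$$\begin{aligned}
E\Big(\sum_{i=2}^{\tau-1} I_{\{\Gamma_i \ne \Gamma_j,\ j<i\}}\Big) I_{\{|\tau - E\tau| \le t_n\}}
&\ge E\Big(\sum_{i=2}^{n_0} I_{\{\Gamma_i \ne \Gamma_j,\ j<i\}}\Big) I_{\{|\tau - E\tau| \le t_n\}} \\
&= E\sum_{i=2}^{n_0} I_{\{\Gamma_i \ne \Gamma_j,\ j<i\}} - E\Big(\sum_{i=2}^{n_0} I_{\{\Gamma_i \ne \Gamma_j,\ j<i\}}\Big) I_{\{|\tau - E\tau| > t_n\}} \\
&\ge \sum_{i=2}^{n_0} P(\Gamma_i \ne \Gamma_j,\ j < i) - nP(|\tau - E\tau| > t_n) \\
&= \sum_{i=2}^{n_0} P(\Gamma_i \ne \Gamma_j,\ j < i) - O(n^{-1}).
\end{aligned}$$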

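We will now fix $k$ and evaluate $\sum_{i=2}^{k} P(\Gamma_i \ne \Gamma_j,\ j < i)$ as follows:

$$P(\Gamma_i \ne \Gamma_j,\ j < i) = \sum_{m=1}^{\infty} P(\Gamma_i = m,\ \Gamma_j \ne m,\ j < i) = \sum_{m=1}^{\infty} \frac{1}{2^m}\Big(1 - \frac{1}{2^m}\Big)^{i-1}.$$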

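Hence, by summing over $i$ we get:

$$\begin{aligned}
\sum_{i=2}^{k} \sum_{m=1}^{\infty} \frac{1}{2^m}\Big(1 - \frac{1}{2^m}\Big)^{i-1}
&= \sum_{m=1}^{\infty} \frac{1}{2^m} \sum_{i=1}^{k-1} \Big(1 - \frac{1}{2^m}\Big)^{i} \\
&= \sum_{m=1}^{\infty} \frac{1}{2^m}\Big(1 - \frac{1}{2^m}\Big) \frac{1 - (1 - 2^{-m})^{k-1}}{1 - (1 - 2^{-m})} \\
&= \sum_{m=1}^{\infty} \Big(1 - \frac{1}{2^m}\Big)\Big(1 - \Big(1 - \frac{1}{2^m}\Big)^{k-1}\Big)
 = \sum_{m=1}^{\infty} \Big\{\Big(1 - \frac{1}{2^m}\Big) - \Big(1 - \frac{1}{2^m}\Big)^{k}\Big\} \\
&= \sum_{m=1}^{\infty} \Big\{1 - \Big(1 - \frac{1}{2^m}\Big)^{k}\Big\} - \sum_{m=1}^{\infty} \frac{1}{2^m}
 = \sum_{m=1}^{\infty} \Big\{1 - \Big(1 - \frac{1}{2^m}\Big)^{k}\Big\} - 1.
\end{aligned}$$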
It follows that after ignoring terms of order $o(1)$ we have

$$\sum_{m=1}^{\infty} \Big\{1 - \Big(1 - \frac{1}{2^m}\Big)^{n_0}\Big\} \le ED_n \le \sum_{m=1}^{\infty} \Big\{1 - \Big(1 - \frac{1}{2^m}\Big)^{n_1}\Big\}.$$

Since both $n_0$ and $n_1$ are of the form $\frac{n}{2}(1 + o(1))$, Proposition 1 follows.

4. PROOF OF PROPOSITION 2 

Recall that

$$f(x) = \sum_{m=1}^{\infty} \Big\{1 - \Big(1 - \frac{1}{2^m}\Big)^{2^x}\Big\}.$$

Throughout the proof we can suppose that $0 \le x < 1$. We first give a simple
argument which gives the limiting behavior of $f(x)$ without, however, yielding an
estimate for the rate of convergence. We re-index the sum by $m + k$ to obtain

$$f(k + x) - k - x = -x - \sum_{m=-k+1}^{0} (1 - 2^{-m-k})^{2^{k+x}} + \sum_{m=1}^{\infty} \big(1 - (1 - 2^{-m-k})^{2^{k+x}}\big).$$

Permuting summation and limits as $k \to \infty$ yields

$$f(x + k) - k - x = -x - \sum_{m=-\infty}^{0} \exp(-2^{-m+x}) + \sum_{m=1}^{\infty} \big(1 - \exp(-2^{-m+x})\big) + o(1).$$

This step is justified by dominated convergence using the following majorizing
convergent series of positive terms independent of $k$:

$$\sum_{m=-k+1}^{0} (1 - 2^{-m-k})^{2^{k+x}} \le \sum_{m=-\infty}^{0} \exp(-2^{-m+x})$$

and

$$\sum_{m=1}^{\infty} \big(1 - (1 - 2^{-m-k})^{2^{k+x}}\big) \le \sum_{m=1}^{\infty} \big(1 - \exp(-2^{-m+x+1})\big).$$

These follow from the estimates $\exp(-2ab) \le (1 - b/\lambda)^{a\lambda} \le \exp(-ab)$ if $\lambda a > 0$
and $b/\lambda \le 1/2$, with $\lambda = 2^k$, $a = 2^x$ and $b = 2^{-m}$.

The series thus established as the limit of f(k + x) — (k + x) defines a function 
of period 1. Denoting its mean by c and its zero-mean part by g(x) we have 

f(x + k) = x + k + c + g(x) + o(l) = x + k + c + g(x + k) + o(l). 

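To obtain the finer estimate stated in the proposition we use the higher order
estimate $\exp(-ab - ab^2/\lambda) \le (1 - b/\lambda)^{a\lambda}$ if $\lambda a > 0$ and $b/\lambda \le 1/2$. We must bound

$$\begin{aligned}
f(x + k) - (x + k + c + g(x)) &= \sum_{m=-\infty}^{-k} \exp(-2^{-m+x}) \\
&\quad + \sum_{m=-k+1}^{\infty} \big\{\exp(-2^{-m+x}) - (1 - 2^{-m-k})^{2^{k+x}}\big\}.
\end{aligned}$$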
The first sum can be rewritten as

$$\sum_{m=0}^{\infty} \exp(-2^{m+k+x}) = \exp(-2^{k+x}) \sum_{m=0}^{\infty} \exp\big(-(2^m - 1)2^{k+x}\big),$$

which is bounded by

$$\exp(-2^{k+x}) \sum_{m=0}^{\infty} \exp\big(-(2^m - 1)\big)$$

and thus makes an exponentially small contribution to an error term of $O(2^{-k-x})$.
The second sum consists of positive terms and is bounded above by

$$\sum_{m=-\infty}^{\infty} \big\{\exp(-2^{-m+x}) - \exp(-2^{-m+x} - 2^{-2m+x-k})\big\}.$$

Then the inequality $\exp(-a) - \exp(-a - b) \le b\exp(-a)$ for positive $a$ and $b$ gives
the bound

$$2^{-x-k} \sum_{m=-\infty}^{\infty} 2^{-2m+2x} \exp(-2^{-m+x}).$$

This bound has the form $2^{-x-k} h(x)$ where $h$ is a periodic function of $x$ and is
therefore bounded by a constant. This establishes the asserted rate of convergence.

It remains to calculate the mean $c$. The mean of $-x$ is $-1/2$ and the mean
of the residual series is

$$-\sum_{m=-\infty}^{0} \int_0^1 \exp(-2^{-m+x})\,dx + \sum_{m=1}^{\infty} \int_0^1 \big(1 - \exp(-2^{-m+x})\big)\,dx.$$

On the $m$-th summand the change of variable $u = 2^{-m+x}$ gives

$$(c + 1/2)\ln 2 = -\sum_{m=-\infty}^{0} \int_{2^{-m}}^{2^{-m+1}} \exp(-u)\,\frac{du}{u} + \sum_{m=1}^{\infty} \int_{2^{-m}}^{2^{-m+1}} \big(1 - \exp(-u)\big)\,\frac{du}{u}$$

or

$$(c + 1/2)\ln 2 = -\int_1^{\infty} \exp(-u)\,\frac{du}{u} + \int_0^1 \big(1 - \exp(-u)\big)\,\frac{du}{u}.$$

Integrating each integral by parts yields a single integral

$$-\int_0^{\infty} \exp(-u) \ln u \,du,$$

which is a well-known integral representing Euler's constant.
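
The value of the two-integral expression can be confirmed numerically. The following sketch uses a composite Simpson rule, which is a choice made for this illustration and is not taken from the note. The upper limit 60 for the infinite integral and the step counts are assumptions that keep the truncation and discretization errors far below the tolerance of interest.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant

def simpson(func, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = func(a) + func(b)
    s += 4 * sum(func(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(func(a + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3

def near_zero(u):
    """(1 - exp(-u)) / u, extended by its limit 1 at u = 0."""
    return 1.0 if u == 0.0 else -math.expm1(-u) / u

def tail(u):
    """exp(-u) / u, the integrand on [1, infinity)."""
    return math.exp(-u) / u

value = simpson(near_zero, 0.0, 1.0, 2000) - simpson(tail, 1.0, 60.0, 60000)
print(value, EULER_GAMMA)
```

Dividing by $\ln 2$ and subtracting $1/2$ then gives the constant $c = \gamma/\ln 2 - 1/2$ of Proposition 2.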

We remark that similar, slightly more careful reasoning shows that the periodic function

$$h(x) = \sum_{m=-\infty}^{\infty} 2^{-2m+2x} \exp(-2^{-m+x})$$

appearing in the rate estimate overestimates the error asymptotically by a factor
of two and, in fact,

$$f(x + k) = x + k + \gamma/\ln 2 - 1/2 + g(x) + h(x)2^{-x-k-1} + O(2^{-2x-2k}).$$
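
The refined expansion can be tested numerically by comparing the actual error $f(x+k) - (x + k + c + g(x))$ with the predicted term $h(x)2^{-x-k-1}$. This is a hedged sketch: the helper names, the truncation limits, and the use of `log1p`/`expm1` to evaluate $f$ stably are assumptions of this example.

```python
import math

def f(y, terms=200):
    """f(y) = sum over m >= 1 of 1 - (1 - 2^-m)^(2^y), evaluated via log1p and expm1."""
    power = 2.0 ** y
    return sum(-math.expm1(power * math.log1p(-2.0 ** -m)) for m in range(1, terms + 1))

def limit_series(x, terms=80):
    """c + g(x): the limit of f(x + k) - (x + k) as k grows, from the two series."""
    left = sum(math.exp(-2.0 ** (x - m)) for m in range(0, -12, -1))
    right = sum(-math.expm1(-2.0 ** (x - m)) for m in range(1, terms + 1))
    return -x - left + right

def h(x, span=80):
    """h(x) = sum over all integers m of 2^(-2m+2x) exp(-2^(-m+x)), truncated."""
    return sum(4.0 ** (x - m) * math.exp(-2.0 ** (x - m)) for m in range(-12, span + 1))

x, k = 0.3, 10
error = f(x + k) - (x + k) - limit_series(x)
predicted = h(x) * 2.0 ** (-x - k - 1)
print(error, predicted, error / predicted)
```

For moderate $k$ the ratio should be close to 1, in line with the factor of two mentioned above.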

The bound on $g$ and its nonconstant character are easily checked numerically,
although calculations dealing with $f$ rather than analytically derived asymptotic
forms are rather sensitive. Alternatively, its complex Fourier coefficients are easily
obtained by a simple variant of the calculation of $c$ and have the form $c_k =
\Gamma(2\pi k i/\ln 2)/\ln 2$. These are nonzero, small and decrease geometrically in
magnitude. For example $2|c_1| = .00000157316$ accounts for the maximum contribution
of the first harmonic while for the second harmonic $2|c_2| < 10^{-12}$.
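
The quoted magnitudes can be reproduced without a complex gamma routine. The sketch below relies on the standard identity $|\Gamma(iy)|^2 = \pi/(y \sinh \pi y)$ for real $y \ne 0$, which is an ingredient supplied for this illustration and is not taken from the note. The function names are hypothetical.

```python
import math

def abs_gamma_imag(y):
    """|Gamma(i y)| for real y != 0, via |Gamma(iy)|^2 = pi / (y sinh(pi y))."""
    return math.sqrt(math.pi / (y * math.sinh(math.pi * y)))

def harmonic_amplitude(k):
    """2 |c_k| with c_k = Gamma(2 pi k i / ln 2) / ln 2, for k >= 1."""
    y = 2 * math.pi * k / math.log(2)
    return 2 * abs_gamma_imag(y) / math.log(2)

print(harmonic_amplitude(1))
print(harmonic_amplitude(2))
```

The rapid decay comes from the factor $\sinh(\pi y)$, which grows like $e^{\pi y}/2$ with $y = 2\pi k/\ln 2$.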